*Input Data

*Start with make dataset, which compromises informations about the brands included in the data
import delimited "$root/Data/Original/make.txt", clear
keep v4 v6 v7
rename v4 MAKNatCode
rename v6 MAKRecDate 
rename v7 MAKName

save "$root/Data/Original/make.dta", replace

*Next we take the module MODEL
import delimited "$root/Data/Original/model.txt", clear
keep v4 v6 v7 v9 v10 v11 v12 v13 v16 v17 v18 v19
rename v4 MODNatCode
rename v6 MODRecDate
rename v7 MAKNatCode
rename v9 MODName
rename v10 MODName2
rename v11 MODNameGrp1
rename v12 MODNameGrp2
rename v13 MODModelSerCode
rename v16 MODImpBegin
rename v17 MODImpEnd
rename v18 MODSuccessor
rename v19 MODPrev

save "$root/Data/Original/model.dta", replace

*Next the module PRICEHISTORY which comprises actual and historical prices:

import delimited "$root/Data/Original/pricehistory.txt", clear

keep v3 v6 v7 v12 v21
rename v3 car_ID
rename v6 PRHrecDate
rename v7 PRHnewprice
rename v12 PRHstartDate
rename v21 PRHendDate

*we assume validity until today if validity date is missing --> this means there was no more price update, could be that car is no longer sold, but that does not matter, cause we only use date as merging variable anyways.
replace PRHendDate=20191231 if PRHendDate==.

*convert startDate and endDate into Stata dates
tostring PRHstartDate, replace
tostring PRHendDate, replace


gen startDate=date(PRHstartDate, "YMD")
gen endDate=date(PRHendDate, "YMD")

format startDate %td
format endDate %td

gen startDate_m=mofd(startDate)
gen endDate_m=mofd(endDate)

format startDate_m %tm
format endDate_m %tm


gen startYear=year(startDate)
gen endYear=year(endDate)


duplicates tag car_ID, gen(dup)


*generate a panel out of the data

*generate a variable that numerates duplicates:
quietly bys car_ID (startDate_m):  gen duplicates = cond(_N==1,0,_n)

*Generate an ID variable for each observation
egen group_id=group(car_ID)
egen idS=concat(group_id duplicates), p(+)
egen id=group(idS)

expand endDate_m - startDate_m + 1
by id, sort: gen time = startDate_m + _n - 1
by id (time), sort: gen valid = (_n <= _N)

format time %tm

*drop all the help variables
keep PRH* car_ID time

sort car_ID time
duplicates drop car_ID time, force

save "$root/Data/Original/pricehist.dta", replace

*Lastly we also need the type data
import delimited "$root/Data/Original/type.txt", clear
keep v3 v6 v7 v8 v10 v12 v13 v16 v17 v21 v25 v26 v30 v37 v40 v45 v73 v78 v79
rename v3 car_ID
rename v6 TYPName
rename v7 TYPName2
rename v8 TYPSeg1
rename v10 TYPSeg2
rename v12 MODNatCode
rename v13 MAKNatCode
rename v16 TYPImpStart
rename v17 TYPImpEnd
rename v21	TYPKW
rename v25 TYPFuel
rename v26 TYPBody
rename v30 TYPCylinder
rename v37 TYPPollutNorm
rename v40 TYPDriveType
rename v45 TYPWeight
rename v73 TYPConsRating
rename v78 TYPSecFuel
rename v79 TYPSecKW

**Add some additional information:

replace TYPImpEnd=201912 if TYPImpEnd==.
tostring TYPImpStart, replace
tostring TYPImpEnd, replace

gen IMPStart=date(TYPImpStart, "YM")
gen ImpStart=mofd(IMPStart)
format ImpStart %tm

gen IMPEnd=date(TYPImpEnd, "YM")
gen ImpEnd=mofd(IMPEnd)
format ImpEnd %tm

drop TYPImpStart TYPImpEnd IMP*


save "$root/Data/Original/type.dta", replace


**Merge all the informations together
merge m:1 MAKNatCode using "$root/Data/Original/make.dta"
drop _merge MAKRecDate
merge m:1 MODNatCode using "$root/Data/Original/model.dta"
drop MODPrev MODSuccessor _merge MODImpBegin MODImpEnd MODRecDate
save "$root/Data/Original/type_model_make.dta", replace
merge 1:m car_ID using "$root/Data/Original/pricehist.dta"
drop PRHendDate PRHstartDate PRHrecDate _merge

save "$root/Data/Original/carprices.dta", replace









